To paint a broad though much simplified picture,
let us suppose at the outset that scholarship begins
with the collection of facts. These facts are of two
distinct kinds. The first are observations, and they
consist, for example, of the results of controlled
experiments or observations from field work in the case
of science or, perhaps, they are derived from the
study of historical documents in the case of history,
and so on. The second kind of facts are the reported
observations, descriptions of phenomena or events,
or the theories provided by contemporary scholars.
In aggregate, let us refer to the first kind of facts as
“data” and the second as “information.” From the
confluence of these two kinds of facts in the mind
of the scholar, new descriptions and theories are
born. When he makes these public, new
information is generated.

Scholarship, strictly conceived, is this activity in
the mind of the scholar. On its right hand are
sources: data and information. On its left are
publications: the products of this activity made public.
But these two sides of scholarship are closely related.
What to one scholar is a publication, to another
is information. Every scholar stands both to the right
and to the left of every other one.

In our text processing work at the University of
Pittsburgh, we look upon our computers and our
developing system of programs as a tool designed to
extend the abilities of the scholar, on the one hand
to collect, sort, and understand information, and on
the other to disseminate to others the information
that he generates. In other projects and for most of
the users at our Center, our computers and systems
of programs are seen as a tool to extend the ability
to process and analyze data. These systems are, of
course, well developed. In analyzing data, one’s
concern is to reduce, to simplify, and to summarize,
preserving only the most significant aspects of the
data. In processing information, by contrast, we wish to
preserve every jot and tittle, allowing no
characteristic of any significance to go unrecorded or
untransmitted. Finally, in the research we ourselves do
that utilizes natural language text, we come full
circle and again use our systems as data processors
and analyzers, treating the information we have
collected as data.

Figure 1 shows schematically the overall design
of our text processing system. Four kinds of input
are shown. The text on magnetic tape in any
arbitrary format may be material obtained from other
centers or from any source that produces text on
tape. One day this source may include material read
by optical character recognition equipment. The
printer’s control tapes are paper tapes obtained
from printers and publishers which were originally
used to control some kind of typesetting equipment.
We have locally constructed a paper tape reader that
will accept 5-, 6-, 8-, 15-, and 31-channel paper tape
and, through an IBM 1401 computer, write magnetic
tape. This work was completed under a Department
of Defense Advanced Research Projects Agency
grant and has been reported elsewhere.1,2 The
text punched on cards or on Flexowriter-type paper
tape would normally represent material prepared at
our Center.

Figure 1. Block diagram of the general text processing
system.

The block labeled “conversion to standard magnetic
tape” represents the encoding of all forms of
natural language text into a particular format
according to a schema devised by Martin Kay and
Ted Ziehe of the Rand Linguistics Research Group.
A relatively complete, but still preliminary,
description of this format has been published as a Rand
Memo.3 The use of magnetic tape for storage of text
and the use of this standard format are prominent
in our system and more will be said about this in a
moment.

Some source text in exceptionally good condition
may, after encoding in this standard form, be ready
for distribution to other centers requesting it or for
use in our own research. Characteristically, however,
some additional processing will be required
and this is represented in the block labeled “utility.”
At the bottom of this figure, our use of text as
data is represented. Under “in-house analysis” we
have listed information retrieval research,
auto-abstracting, and content analysis as examples of this
kind of work.

The series of blocks down the right side of Fig. 1
shows the normal sequence of operations for
photocomposition. Material to be photocomposed will, in
most cases, be specifically keyboarded for that
purpose. This material will be under good control from
the beginning and can go directly into the typesetting
system unless it will be used for other purposes
as well. Sorting, editing, and other processing will
generally not be required so that the conversion to
standard format can be bypassed. Both kinds of
input to the typesetting system are allowed. An
expanded block diagram of the typesetting system
itself will be shown in a later figure.

Our system depends to a large extent on the
efficient processing of large amounts of natural
language text on magnetic tape and this aspect of our
system will be described in somewhat greater detail.
Magnetic tape is, of course, an economical storage
medium and is easily shipped between
geographically separated centers. Encoding all text in one
standard format becomes important when many
different kinds of text from many different sources
must be processed and shared. When standardized
input can be expected, a smaller number of general
programs can be written and a useful library can
begin to be accumulated. The standard adopted
must be flexible enough to handle any material one
may encounter. The Rand format seems to fill all of
our current and anticipated requirements and we
have adopted it for our system.

On seven-channel magnetic tape, the minimum
unit is a six-bit pattern plus a parity bit. In a
one-to-one character representation, only 64 unique
characters can be defined. In order to extend the
number of different characters that can be
represented on tape, either more than one six-bit pattern
can be assigned to each character to be represented
or else, as in the Rand standard format, some of the
available 64 patterns can be used to change the
meaning of the patterns that follow them on tape.
These mode change patterns or characters are of
two kinds: “flags” and “shifts.” The flags change
the interpretation of succeeding patterns to a new
alphabet, while the shifts retain the same alphabet,
but mediate changes to, for example, upper case,
italics, larger type size, and so on.
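
The mode-change idea can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the flag and shift pattern values, the two tiny alphabets, the rule that a flag resets the shift, and the names FLAGS, SHIFTS, ALPHABETS, and decode are all invented for illustration and are not the actual Rand code assignments. Only the blank (octal 60) is taken from the description of the format.

```python
# Hypothetical tables: the real Rand code assignments are not reproduced here.
FLAGS = {0o01: "roman", 0o02: "punct"}              # flag pattern -> alphabet
SHIFTS = {"roman": {0o10: "upper", 0o11: "lower"}}  # shift patterns per alphabet
ALPHABETS = {
    "roman": {0o21: "a", 0o22: "b", 0o23: "c"},
    "punct": {0o21: ".", 0o22: ",", 0o23: ";"},
}
BLANK = 0o60  # the blank keeps its meaning in every alphabet


def decode(patterns):
    """Turn a sequence of six-bit patterns into text, tracking alphabet and shift."""
    alphabet, shift, out = "roman", "lower", []
    for p in patterns:
        if not 0 <= p < 64:
            raise ValueError("not a six-bit pattern: %r" % p)
        if p in FLAGS:                        # flag: switch to a new alphabet
            alphabet = FLAGS[p]
            shift = "lower"                   # assumption: a flag resets the shift
        elif p in SHIFTS.get(alphabet, {}):   # shift: same alphabet, new style
            shift = SHIFTS[alphabet][p]
        elif p == BLANK:
            out.append(" ")
        else:                                 # ordinary pattern in current alphabet
            ch = ALPHABETS[alphabet][p]
            out.append(ch.upper() if shift == "upper" else ch)
    return "".join(out)
```

The same pattern (octal 21 here) yields a letter or a period depending on the flag most recently seen, which is how a 64-pattern code can carry far more than 64 characters.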

Fifteen of the available 64 patterns are
permanently assigned as alphabet flags in the Rand
system. These 15 patterns along with the blank (octal
60) and a filler character (octal 77) are not a part
of any alphabet and their interpretation never
changes. There are, then, 47 patterns which can be
assigned meanings in each of the 15 alphabets. In
each of the 15 alphabets, some of the available 47
patterns will be assigned mode change functions as
shift characters. In the Roman alphabet, for
example, nine patterns are used in this way. The
remaining 38 patterns can accommodate the 26 letters, 10
diacritic marks, and the apostrophe with one
pattern left unassigned. Notice that separate alphabets
must be used for punctuation, the numerals, and
other symbols occurring frequently in the English
This encoding system gives a flexible representation
of the micro-characteristics of text. Larger
units of text, however, have a hierarchical
organization which also requires representation. This is
accomplished in the Rand system by the “catalog”
format. The fundamental unit in this system is the
datum which can be thought of as a manipulable
unit of information. A datum may be a text entry
consisting of one physical line of text if from a
previously printed source, or one sentence, or one word
if that is convenient, or it may be a title or a
caption from an illustration, or an annotation or
description of another datum added at a later time.
Each datum belongs to a particular class and at the
beginning of each reel of tape following a label
record, a map of the corpus is given describing the
various classes of material contained in the file. Each
datum is coordinated with this map and its proper
identification assured by a system of control and
label words accompanying every datum. A
representation of the Rand encoding system will be shown
later in our second typesetting example.

We in Pittsburgh became interested in automatic
photocomposition when, in October of 1964, we
acquired a Photon S-560 photocomposition machine
from the National Institutes of Health. This
machine had previously been used by Michael Barnett
at the Massachusetts Institute of Technology under
an NIH grant. The Photon is an electromechanical
device driven by punched paper tape. It consists
essentially of a movable glass disk with 1400
characters etched on it and a lens system for projecting
these characters onto roll film. The disk can
accommodate 16 different type fonts arranged in eight
concentric circles or levels around the disk. The
paper tape is punched with double character codes, the
first giving the character position within disk level
and the second giving the escapement for that
character. There are additional codes for advancing the
film, positioning the film carriage horizontally,
effecting lens shifts for size control, and effecting
shifts to new disk levels for font changes.
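
The double character code can be pictured as a list of pairs, one pair per character set. The following Python sketch assumes invented position and width tables (POSITIONS and WIDTHS are hypothetical; actual Photon positions and escapements depend on the disk and font and are not given here), and the function names are ours.

```python
# Hypothetical tables for a four-character "font"; real values are not given.
POSITIONS = {" ": 0, "a": 1, "b": 2, "i": 9}   # position within a disk level
WIDTHS = {" ": 6, "a": 9, "b": 10, "i": 5}     # escapement in arbitrary units


def to_photon_pairs(text):
    """Return the (position, escapement) code pair for each character."""
    return [(POSITIONS[ch], WIDTHS[ch]) for ch in text]


def line_width(pairs):
    """Total escapement of a line: the sum of the second code of each pair."""
    return sum(escapement for _, escapement in pairs)
```

Because the escapement travels with every character, a program that knows the pairs for a line also knows its set width, which is what a justification routine needs.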

When we received the Photon, we also acquired
the PC6 system of automatic photocomposition
programs developed under the direction of Barnett
while he was at M.I.T.4-8 The PC6 system is
typified by the TYPRINT program which requires text
containing fixed typesetting control codes as input.
These codes are set off from the text by square
brackets, which are reserved, and have fixed
meanings as shown in the following examples:

[NP ]  New Paragraph
[DL6]  Shift to Disk Level 6 (Highland type face)
[VL2]  Leave 2 Blank Lines
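
As a rough illustration of what fixed codes embedded in text look like to a program, the Python sketch below separates running text from the three bracketed codes quoted above. It is not the TYPRINT program: the regular expression, the function name split_pc6, and the tuple layout of the result are assumptions made for this example, and no other PC6 codes are handled.

```python
import re

# Only the three codes quoted above are recognized.
CODE = re.compile(r"\[(NP|DL|VL)\s*(\d*)\s*\]")


def split_pc6(text):
    """Split text into ('text', str) and ('code', name, number-or-None) items."""
    items, pos = [], 0
    for m in CODE.finditer(text):
        if m.start() > pos:                      # running text before the code
            items.append(("text", text[pos:m.start()]))
        number = int(m.group(2)) if m.group(2) else None
        items.append(("code", m.group(1), number))
        pos = m.end()
    if pos < len(text):                          # trailing text after last code
        items.append(("text", text[pos:]))
    return items
```

Since the meaning of each code is fixed in the program, changing a format means editing every occurrence of the code in the text.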

In using this system, we soon found that the
insertion of fixed codes can be laborious, that changes in
format require changes throughout the text, and
that many desirable formats are impossible to
achieve. We felt that a more flexible and more
generally useful system of programs could be written.
We still believe, however, that the PC6 system was
a successful first step toward automatic
photocomposition and, in general, the typesetting system we
have developed is an outgrowth of our experiences
with it.

The input to our system is either magnetic tape
in an arbitrary format produced from paper tape
punched specifically for typesetting or else magnetic
tape in the Rand standard format. The output is
again paper tape that will drive the Photon. A
schematic diagram of this system is shown in Fig. 2. In
this figure, the two forms of magnetic tape input
are shown at the top. The typesetting program is
shown as two separable functions. The first part,
which translates text into the double character
Photon code, is relatively independent of the second
part, but is quite dependent on the particular
photocomposition device being used, that is, on the
Photon. This part would be largely rewritten if a new
piece of equipment were obtained. It is, however, a
rather simple and straightforward program. The
second part, labeled the “page formatting program,”
represents a real departure from the PC6 system
and other typesetting systems we have seen. In this
program, a full page of text is set before outputting
is begun.

Figure 2. Block diagram of the typesetting system.

The page formatting program shows two forms of
output. The first is a magnetic tape which contains
Photon input that will be converted to paper tape.
The other form of output, labeled the “history tape,”
is a magnetic tape containing the original text
characters with their associated Photon codes, all of the
material added by the page formatting program,
page and line numbers, and sufficient parametric
information to reset the material exactly as it was
originally done. This tape can be recycled through
the page formatting program with corrections or
additions to the text or simply with changed
parameters if the format is to be changed. Since page
numbers, tables, captions for figures, titles and
subtitles, and so on are all in their proper place on this
tape, it can be used as input to a program that
produces indices and tables of contents. Finally, as
shown, this tape might simply be stored for a
period of time and then recycled when a new edition is
to be set.

This history tape is an important by-product of
computerized typesetting and may well be a critical
factor in making the adoption of an automatic
system economically feasible. This tape is essentially
an exact copy of the printed material, less
illustrations which cannot be handled in our system, and is
a compact, machine-readable counterpart of the
standing type that occupies space in some print
shops and warehouses. Any material in this file can
be simply addressed by page and line number from
the corresponding printed document and changes
made. If a change is made that affects the
remainder of the file, for example an insertion that affects
the pagination, all of the file will automatically be
corrected.
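
Addressing by page and line, with automatic correction of everything downstream, can be sketched in a few lines of Python. This is a deliberately simplified model: it assumes a fixed number of lines per page and treats the history tape as a plain list of already-set lines, whereas the real system recycles the tape through the page formatting program. The function names are ours.

```python
def paginate(lines, lines_per_page):
    """Group a flat list of set lines into pages of a fixed depth."""
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


def insert_after(pages, page, line, new_lines, lines_per_page):
    """Insert new_lines after the given 1-based page and line, then repaginate."""
    flat = [ln for pg in pages for ln in pg]
    # Convert the printed (page, line) address into a position in the file.
    index = sum(len(pg) for pg in pages[:page - 1]) + line
    flat[index:index] = new_lines
    return paginate(flat, lines_per_page)
```

For example, inserting one line after page 1, line 2 of a two-line-per-page file pushes the remaining lines onto later pages without any further editing.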

In designing this system, we came to the
conclusion that typesetting control codes in the text to be
set are necessary if any format flexibility is to be
obtained. They, therefore, appear minimally in our
system. We have tried at the same time to ease the
burden of keyboarding these codes and of changing
their meaning in pre-prepared text by making them
entirely arbitrary. The text-dependent codes can be
thought of simply as markers. The actions to be
taken when particular codes are encountered are
separately specified as parameters to the system.
These parameters can be inserted anywhere in the
text ahead of the markers to which they refer, or
they can be punched on parameter cards. If they are
keyboarded with the text, they are normally marked
off by dollar signs or some other specified reserved
symbol. The form of the printed output can be
completely changed by changing these parameters with
no re-editing of the text itself.
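
The separation of markers from their meanings can be shown with a short Python sketch. It is an illustration only: the marker "*P", the idea of representing an action as a replacement string, and the function name apply_markers are assumptions made here, and real actions in the system are typesetting operations rather than strings.

```python
import re


def apply_markers(text, actions):
    """Replace each arbitrary marker by the action currently assigned to it."""
    # Longer markers are tried first so a short marker cannot shadow a long one.
    markers = sorted(actions, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in markers))
    return pattern.sub(lambda m: actions[m.group(0)], text)


TEXT = "*PFirst.*PSecond."           # keyboarded once, with arbitrary markers
FLUSH = {"*P": "\n"}                 # one set of parameters
INDENTED = {"*P": "\n    "}          # another set; TEXT itself is unchanged
```

Calling apply_markers(TEXT, FLUSH) and apply_markers(TEXT, INDENTED) gives two different layouts from the same keyboarded text, which is the point of keeping the actions in the parameters.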

In our system, we wished to include the ability to
control as much as possible the layout and final
form of the pages in the manuscript. We felt that
the deficiencies of other systems in this respect
stemmed from their line-by-line typesetting. The
attempt to visualize a page by as yet undefined
lines is difficult and usually leads to a number of
unnecessary trial runs. To ease this difficulty on the
programming level, we set full pages. On the
conceptual level, we conceive of a page as a collection
of subpages or “boxes.” A box is a string of fixed
text delimited by two markers. The material within
a box can be set independently of other material as
though it were itself a page and then the box of
fixed material placed in its proper position on the
page. The box system is recursive so that boxes
may be defined within boxes and for most
functions, overlapping is allowed.
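
A recursive box structure of this kind can be modeled as a small tree. The Python sketch below is an assumption-laden illustration: the Box class, the token list, and the marker table are hypothetical, and it handles only strictly nested boxes, not the overlapping that the system allows for most functions.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Box:
    """A run of text between two markers; children are text or nested boxes."""
    name: str
    children: List[Union[str, "Box"]] = field(default_factory=list)


def parse_boxes(tokens, markers):
    """Build a Box tree; markers maps an opening marker to (closing marker, name)."""
    root = Box("page")
    stack = [(root, None)]            # (box being filled, marker that closes it)
    for tok in tokens:
        box, closer = stack[-1]
        if closer is not None and tok == closer:
            stack.pop()               # current box is complete
        elif tok in markers:
            close, name = markers[tok]
            child = Box(name)
            box.children.append(child)
            stack.append((child, close))
        else:
            box.children.append(tok)  # ordinary text inside the current box
    if len(stack) != 1:
        raise ValueError("unclosed box: " + stack[-1][0].name)
    return root
```

Checking for the closing marker before the opening one lets the same character open and close a box, as a slash does around a title.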

The parameters used to control the system are of
three types: (1) general parameters, (2) text
boundary parameters, and (3) box parameters. A list of
the general parameters is shown in Fig. 3.

Most of these parameters control the general
appearance of the printed output. They include the
specification of page size, number of columns on
the page, type face, point size, and so on. The
parameters specifying running page headers include a
provision for incorporating page numbers that are
automatically incremented. The last two parameters
are provided to make the keyboarding somewhat
simpler. The DLIM code allows the specification of
any character to mark off parameters when these
are included in the text in place of the preset dollar
sign. The DEL code allows any character to be
specified as a deletion code. It causes a character
over which it is typed to disappear from the input
string. Only those parameters that are to be
different from their preset values need be specified.

SYMBOL          MEANING                    NOTES

PSIZ(x,x)       Page SIZe                  Page size is width by height.
COL(x,x,x...)   COLumns                    Column widths and margins alternate.
JUSV(s)         JUStification-Vertical     Reserved words such as center, spread,
JUSH(s)         JUStification-Horizontal   etc. are used to indicate action desired.
TFAC(n,i,b,B)   Type FACe                  n,i,b,B are names of type fonts.
FONT(f)         FONT                       Used to indicate italic or bold type.
TSIZ(p)         Type SIZe                  Type size is given in points.
BGND(p)         BackGrouND size            Background size is also in points.
TAB(x,x,x...)   TAB                        Tab setting measured from left margin.
MWS(p,p)        Minimum Word Spacing       Minimum distance between words.
XWS(p,p)        maXimum Word Spacing       Maximum distance between words.
HEAD(t)         HEADer                     Headers may be any string of text; it
LHEAD(t)        Left HEADer                will be set on both pages. LHEAD and
RHEAD(t)        Right HEADer               RHEAD are set on respective pages only.
DLIM(c)         DeLIMiter                  Used to surround instructions in text.
DEL(c)          DELetion character         Removes unwanted characters when
                                           backspacing.

x is a dimension expressed in inches.

p is a dimension expressed in points.

t is any string of text and may include any boundary or box
markers.

c is any keyboardable character and should be one which is
not normally used in the text.

n,i,b,B are the names of type faces available. They determine
which type face will be used for normal, italic, bold-face and
bold-face-italic letters.

s may be one of the following reserved character strings:
CNT(CeNTer), LFT(LeFT), RGT(RiGhT), SPR(SPRead), BTM(BoTtoM),
TOP.

f may be one of the following: NOR(NORmal), ITAL(ITALic), BOLD,
or BOLD-ITAL.

Figure 3. List of general parameters.

The following list of general parameters:

$ PSIZ(8.5, 11), TFAC(SCOTCH), TSIZ(10),
HEAD(Page /1/), COL(3.5, 1.5, 3.5) $

would specify 8½ by 11 inch pages to be set in
10 point Scotch with two 3½ inch wide columns
separated by 1½ inches. The running heads “Page
1,” “Page 2,” and so on would print at the top of
successive pages. Since the background size and the
minimum and maximum word spacing were not
specified, reasonable values for these would be
computed by the program based on the type size and
line length. Hyphenation would occur if the lines
could not be justified within these computed limits.
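
A parameter list of this form is easy to take apart mechanically. The Python sketch below is one possible reading, not the original program: the function name parse_parameters and the dictionary result are our own, arguments are kept as strings, and header text containing commas or parentheses is not handled.

```python
import re

DEFAULT_DELIMITER = "$"   # the preset delimiter; DLIM may change it


def parse_parameters(block, delimiter=DEFAULT_DELIMITER):
    """Parse 'NAME(arg, arg), NAME(arg)' between delimiters into a dict."""
    body = block.strip().strip(delimiter)
    params = {}
    for name, args in re.findall(r"([A-Z]+)\s*\(([^)]*)\)", body):
        params[name] = [a.strip() for a in args.split(",")]
    return params


EXAMPLE = ("$ PSIZ(8.5, 11), TFAC(SCOTCH), TSIZ(10), "
           "HEAD(Page /1/), COL(3.5, 1.5, 3.5) $")
```

Parameters that are absent from the result (BGND, MWS, and XWS in this example) are the ones for which the program would fall back on preset or computed values. Because column widths and margins alternate, the even-numbered items of COL are the column widths and the odd-numbered items are the spaces between them.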

The general form of the text boundary
parameters and the box parameters are shown in Figure 4.
The text boundary parameters specify a particular
arbitrary text marker and a list of actions that are
to occur when that marker is encountered in the
text. The box parameters specify two particular
arbitrary text markers which will delimit fixed strings
of text to be treated as a box and a list of actions
describing the way material in the box is to be set
and the placement of the box on the page. The lists
of actions in each of these two parameters can
include any of the general parameters or any of the
additional actions listed in this figure.

FORM of the TEXT BOUNDARY PARAMETER:

    $ AT α)CS(β P,P,P,...,P $

FORM of the BOX PARAMETER:

    $ FROM α)CS(β TO α)CS(β P,P,P,...,P $

CS is any arbitrary character string.

α is the letter O if the string between the closed and
open parentheses specifying the text marker is in octal
representation, otherwise it is blank.

β is blank if the marker is not also part of the text; it
is S(Save) or SIN(Save IN box) if it is a part of the text.

P may be any general parameter or any of the following:

SYMBOL      MEANING    NOTES

TAB(w)      TAB        Allows indenting to a predefined tab.
SKIP(z)     SKIP       Allows vertical spacing.
MAR(x,x)    MARgin     Allows margins to be reset.
BSIZ(x,x)   Box SIZe   Specifies dimension of box.
BTAB(w)     Box TAB    Determines horizontal position of box.
BSKIP(q)    Box SKIP   Determines vertical position of box.
UNIT        UNIT       Forces box to be put on one page, i.e., not
                       split.

z may be one of the following reserved symbols: NL(New Line),
nL(number of Lines), NC(New Column), nC(number of Columns),
NP(New Page), nP(number of Pages), nI(number of Inches),
nPT(number of PoinTs).

q may be the same as z plus TOP, BTM(BoTtoM) or CNT(CeNTer),
which means the box should be placed at top, bottom or center
of the column or page.

w may be a number referring to the nth tab defined plus (for
BTAB) LFT(LeFT), RGT(RiGhT) or CNT(CeNTer), which means box
should be even with the left, center, or right side of current
column or page.

x may be a number (inches) or T (depends on text).

Figure 4. Form of the text boundary and box parameters.


The text markers are defined in these two kinds
of parameters as the binary coded decimal
equivalent of the character string appearing between the
close paren on the left and the open paren on the
right. The octal equivalent of the six-bit binary
character may also be placed between the
parentheses, in which case the letter “O” precedes the
marker specification. A marker may be any string
of characters that will not be confounded with text
material. Markers may themselves be a part of the text
to be set. If this option is desired, the letter S or
SIN (for Save or Save IN) is appended to the
marker specification. If S is used, the characters
making up the marker are considered to come
before the marker or outside the box. If SIN is used,
they are considered to come after the marker or
inside the box. These conventions give some format
control over material that has no keyboarded codes
at all.
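
The difference between a marker that is only a code and one saved with S or SIN can be sketched as follows. This Python fragment is one reading of the convention described above, under stated assumptions: the function name split_at_marker is ours, only the first occurrence of the marker is treated, and the two returned strings stand for the material before and after the boundary.

```python
def split_at_marker(text, marker, save=None):
    """Split text at the first marker; save is None, 'S', or 'SIN'."""
    before, found, after = text.partition(marker)
    if not found:
        return text, ""
    if save == "S":        # marker characters are kept ahead of the boundary
        return before + marker, after
    if save == "SIN":      # marker characters are kept after the boundary
        return before, marker + after
    return before, after   # marker is only a code and is dropped from the text
```

With a saved marker, ordinary punctuation already present in the text, such as the comma and space in a citation, can serve as a boundary without any special keyboarding.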

As a first example of the operation of this system
for a straightforward problem, we have taken a part
of the Recent Publications on Computational
Linguistics section of The Finite String for June
1965. This monthly newsletter is a publication of
the Association for Machine Translation and
Computational Linguistics and the short bibliography
section has been photocomposed at our Center since
April of this year. The procedure we actually use
with this material differs somewhat from this
description because the text is keyboarded on a Dura
Machine 10 at the Rand Corporation rather than at
our Center. The differences, however, are minor. In
Fig. 5, the Flexowriter hard copy is shown with the
parameters appearing at the top of the page. Only
general and box parameters are required. The
general parameters set the page size to 6% by 10
inches, the normal type face to Highland, the italic
type face to Highland Italic, the bold type face to
Century Bold, the type size to 8 points, and the
maximum word spacing to 100 points (to preclude
hyphenation). Since the background size and
minimum word spacing are not specified, computed
values will be used. Four boxes are defined. The first
encloses subtitles which are spaced three lines
below preceding material and printed in bold face
and somewhat larger type size. The second encloses
whole bibliographic entries. The associated actions
cause each entry to be treated as a unit, not to be
split between pages, and a line space is left between
them. The third encloses the author’s name which is
to be set in bold face and the last encloses the title
of the publication which is to be set in italics. The
photocomposed result is shown in Fig. 6.

$ TFAC(HIGH, HIGH ITAL, CENT BOLD), TSIZ(8),
$ FROM )[( TO )]( FONT(BOLD) $
$ FROM )/( TO )/( FONT(ITAL) $

**Computational linguistics: Glossaries**

*ENT [Sakai, Toshiyuki] "Procedure for the Analysis of Japanese
Texts," manuscript, presented at the U.S.-Japan Seminar on
Mechanical Translation, New York, May 1965. *ENT

*ENT [Satterthwait, Arnold C.] "Sentence-for-Sentence
Translation: An Example," /Mechanical Translation,/
Vol. 8, No. 2 (February 1965), pp. 14-38. *ENT

*ENT [Tosh, L.W.] "Development of Automatic Grammars,"
/Linguistics,/ No. 12 (March 1965), pp. 49-60. *ENT

Figure 5. Parameters and text for the Finite String example.

Reitz, Gerhard (ed.) Improved Syntactic Flowcharts
- Research Output Format, Progress Report No. 9, The Bunker-Ramo Corporation,

Sakai, Toshiyuki “Procedure for the Analysis of Japanese Texts,” manuscript,
presented at the U.S.-Japan Seminar on Mechanical Translation,
New York, May 1965.

Satterthwait, Arnold C. “Sentence-for-Sentence Translation: An Example,”
Mechanical Translation, Vol. 8, No. 2 (February 1965), pp. 14-38.

Tosh, L. W. “Development of Automatic Grammars,” Linguistics, No. 12
(March 1965), pp. 49-60.

Figure 6. The Finite String bibliography.

This first example was shown to illustrate the
simplicity of the system when limited format
control is required. Our second example is intended to
show a wider range of the possibilities inherent in
the system and in particular, the degree of format
control that can be obtained. This example consists
of the first three pages of an eight-page booklet on
postpartum care prepared for the
University-affiliated Magee-Womens Hospital in Pittsburgh. After
the first two pages, the booklet has a two-column
format with captioned illustrations and the author
had an exact picture in mind of the way in which
each page was to appear. It therefore formed a good
test of the formatting capability of our system.

The illustration was prepared in the following
steps: (1) the straight text was keyboarded on a
Flexowriter without parameters, codes, or markers
of any kind; (2) appropriate text boundary and box
markers were added using a display scope editing
program to be described in a moment; (3) the text
with markers was then converted to the Rand
standard format; and (4) the typesetting programs were
run using this as input. The text editing program
used in step (2) is implied by the blocks labeled
“optional editing by scope” in both Figs. 1 and 2.
The text editor is a general editing program, not
specific to the typesetting system, but we have
found it very useful in preparing material for
photocomposition. We shall give only a brief account
of this program here, since a complete description
can be found in Bacon.9

The text editor program is written for a small
Digital Equipment Corporation PDP-4 computer
with 4K words of core storage, a cathode ray tube
and light pen, a paper tape reader and punch, and a
teletype keyboard. This small computer is
interfaced into our IBM 7090 giving it access to the
tape units, disk file, and core storage of the larger
computer. The interface was constructed locally by
Russell Ranshaw of our staff. Input to the text
editor can be keyboarded directly or read from paper
tape or magnetic tape via the interface. Output can
be typed, punched on paper tape, or written on
magnetic tape.

The text editor continuously displays selected
sections of text on the cathode ray tube and editing
functions can be performed on the displayed text
using the light pen and keyboard. The display is in
two parts as shown in Fig. 7. Along the bottom of
the screen, stationary symbols are shown which
function as push buttons when touched by the light
pen. The remainder of the screen is used to display
the text being edited. The size and intensity of the
characters in the display as well as the vertical and
horizontal dimensions of the display itself can be
varied. All of the text held in the computer at one
time can be caused to move down the face of the
scope or in the reverse direction with a speed
controlled in increments over a wide range of values.
This movement of text, its direction, and speed are
controlled by the light pen and “push buttons,” as
are all input and output functions. A complete list
of the push-button symbols and their functions is
given in the Appendix.

Figure 7. Text editor display.


The light pen can be used to place any one of
three markers under particular characters in the
display. One of these, the cursor, is used to mark a
particular point in the text, while the other two, the
left and right delimiters, are used to mark off
sections of text for deletion or movement. The
movement of delimited material to the point marked by
the cursor or the insertion of material from the
keyboard is controlled by the light pen and push-button
symbols. In our illustration, all of the text boundary
and box markers were inserted using this program.
Figure 8 shows this being done in our office by a
secretary who has had some experience preparing
material for photocomposition. The text of the
illustration is shown at the top of Fig. 9 and then with
the required text boundary and box markers
inserted at the bottom of this figure.
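
The cursor and delimiter operation can be expressed as a short string manipulation. The Python sketch below is an illustration only, not the PDP-4 program: it assumes the text is a single string, that the markers are character indices, and that moving a section to a point inside itself is an error. The function name move_delimited is ours.

```python
def move_delimited(text, left, right, cursor):
    """Move text[left..right] (inclusive) so that it starts at the cursor."""
    if not 0 <= left <= right < len(text):
        raise ValueError("delimiters out of range")
    if left < cursor <= right:
        raise ValueError("cursor lies inside the delimited section")
    section = text[left:right + 1]
    rest = text[:left] + text[right + 1:]
    if cursor > right:
        cursor -= len(section)    # positions after the cut shift to the left
    return rest[:cursor] + section + rest[cursor:]
```

For instance, with the delimiters around "two " in "one two three" and the cursor at the start of the text, the result is "two one three". Deletion is the same operation without the final reinsertion.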


The next step in the processing of this example
was to convert the text with its markers to the
standard format. We may suppose that this
was done in order to make it available for distribution
or to use it for some purpose other than typesetting.
A representation of the text in this standard
format is shown in Fig. 10. A proper representation
would consist simply of a long string of
paired octal digits, but that would not illustrate the
encoding scheme very well. Here the encoded material
is shown on two levels. The upper line shows
the shifts and flags while the lower contains the
text proper. To the left of each datum, a text label
is shown. This six-character label has the text type
indicator as its first character, while the remaining
five characters are specific to the entry. This label
uniquely identifies the datum. In this figure the
types indicated are T for title, H for heading, A for
author, B for body, and C for caption. This information
could be used in the typesetting system to
control the course of the typesetting process, but in
this case, the information is redundant. The flags
shown are represented as B for Boundary alphabet,
R for Roman alphabet, and P for punctuation alphabet.
There are no accepted graphics for the alphabet
flags, since they are non-Hollerith (nonprinting)
six-bit patterns. In the Roman alphabet,
shifts are assigned the numerals 1 through 9. The
Figure 9. Text before and after editing for the booklet example. 


IAfter]!yourHbabyHhas]larrived...]*NP 

30X1 Magee Womens Hospital / 

Pittsburgh, Pennsylvania 33X1 

*2 Prepared by / 

Barbara Roudabush, R.N. *pt 
Illustrated by / 

Ann Retaichak *2*NP// 

34Your body returns to normal...34 

*5 Through a natural process called Involution, organs altered by 
pregnancy return to normal. *5 

*5 The extra tissues of the uterus and breasts that have built up 
during pregnancy are absorbed by the body. *5 

*5 The doctor will measure the progress of this by pressing 
lightly on your abdomen and saying how many "finger" widths 
the top of the uterus is above or below the navel. *5*NC 

*36 Positions of the uterus after delivery. *36 


THE LEFT HAND OF SCHOLARSHIP 


Figure 8. Editing text with scope and light pen. 


After your baby has arrived... 

Magee Womens Hospital 
Pittsburgh, Pennsylvania 

Prepared by 

Barbara Roudabush, R.N. 
Illustrated by 
Ann Retaichak 


Your body returns to normal... 


The extra tissues of the uterus and breasts that have built up 
during pregnancy are absorbed by the body. 


Through a natural process called Involution, organs altered by 
pregnancy return to normal. 


The doctor will measure the progress of this by pressing 
lightly on your abdomen and saying how many "linger" widths 
the top of the uterus Is above or below the navel. 


Positions of the uterus after delivery. 
8 T hrough a natural process called Involution!" organs

B00003:
have built up during pregnancy are absorbed by the
PB
body .8

B00004:
BR1 9
8 T he doctor will measure the progress of
this by pressing lightly on your abdomen and saying how
PR PR
many "finger" widths the top of the uterus is above or
PB
below the navel.8c

BR1 9
C00001: 9 p ositions of the uterus after delivery!!

Figure 10. Standard format representation of the text of the booklet example.

numeral 1 represents a shift to upper case and the
numeral 9 is a shift terminator. The text boundary
and box markers appear in the boundary alphabet
where the character assignments are arbitrary except
for two characters used to delimit sentences
and paragraphs. If the parameters had been keyboarded
within the text, they would appear either
in the Hollerith alphabet or else as text descriptions.
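
The label and shift conventions just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used at Pittsburgh: the function names `parse_label` and `apply_shifts` are hypothetical, the datum is modeled as a list mixing integer shift codes with one-character strings, and only shifts 1 and 9 are handled because those are the only two whose meaning is given in the text.

```python
# Text type indicators named in the description of Fig. 10.
TEXT_TYPES = {"T": "title", "H": "heading", "A": "author",
              "B": "body", "C": "caption"}

SHIFT_UPPER = 1  # numeral 1: shift to upper case
SHIFT_END = 9    # numeral 9: shift terminator


def parse_label(label):
    """Split a six-character label into (text type, entry identifier)."""
    if len(label) != 6:
        raise ValueError("a text label has exactly six characters")
    if label[0] not in TEXT_TYPES:
        raise ValueError(f"unknown text type indicator {label[0]!r}")
    return TEXT_TYPES[label[0]], label[1:]


def apply_shifts(items):
    """Decode a sequence of shift codes (int) and characters (str).

    The list representation is an assumption made for this sketch;
    the real format stores six-bit patterns.
    """
    out = []
    upper = False
    for item in items:
        if isinstance(item, int):
            if item == SHIFT_UPPER:
                upper = True
            elif item == SHIFT_END:
                upper = False
            else:
                # Shifts 2 through 8 exist but are not described in the text.
                raise ValueError(f"shift {item} is not modeled in this sketch")
        else:
            out.append(item.upper() if upper else item)
    return "".join(out)


print(parse_label("B00004"))                # ('body', '00004')
print(apply_shifts([1, "t", 9, "h", "e"]))  # The
```

The second call mirrors the opening of entry B00004 in Fig. 10, where a shift 1 precedes the first letter and a shift 9 follows it.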

In Fig. 11, the parameters for typesetting this
material are shown. The first parameter, $RFORM$,
tells the program that the input is in standard format.
The general parameters set the page size to
8 by 5 inches; the type faces used will be Century
Italic, Century Bold, and Century Bold Italic; the
type size is set to 12 points, and tab positions
are set at 1, 2, 3, 4, and 5 inches from the left
edge of the page. When the standard format is
being used, all text boundary and box markers are
assumed to be single characters in the boundary alphabet
unless otherwise indicated. In this case, an
occurrence in the boundary alphabet of an N causes


TSIZ(12), TAB(1,2,3,4,5), [illegible] CENT ITAL, CENT BOLD, CENT BOLD ITAL),

$ AT )N( SKIP(NP) $

$ AT )4( SKIP(NL) $

$ AT )C( SKIP(NC) $

$ AT )6( SKIP(10PT) $

$ AT )A( COL(3.75,.

$ FROM )2( TO )2( FONT(BOLD ITAL), TSIZ(24), BSKIP(.75),

$ FROM )3( TO )3( TSIZ(8), BSKIP(COT), BTAB(COT) $

$ FROM )5( TO )5( TSIZ(8), BSKIP(BTM), BTAB(ROT) $

$ FROM )7( TO )7( FONT(BOLD ITAL) $

$ FROM )8( TO )8( UNIT $

$ FROM )9( TO )9( TSIZ(8), BSKIP(BTM), BTAB(COT) $

Figure 11. Parameters for the booklet example. 

a skip to a new page, a 4 causes a skip to a new 
line, a 6 causes a skip down of 10 points, an A 
causes a two-column format to be initiated (this 
occurs after the second page), and a C causes a skip 
to a new column. Each word on the title page is put 
in a separate box of the same kind. These are set in 
24 point bold italics and successive boxes are 
skipped down 3/4 inch and moved to the next tab 
position. On the second page, the name and address 
of the hospital are put in a block and centered on 
the page. The names of the author and illustrator 
are put in a box and placed in the lower right cor¬ 
ner of the page. On page three, the subtitle and 
each of the three paragraphs are treated as boxes 
and equally spaced in the left hand column. In the 
right hand column, space is left for an illustration 
with the caption centered beneath it. Figure 12 
shows these three pages in their final printed form. 
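
The treatment of single boundary-alphabet characters described above amounts to a lookup table. The following Python sketch illustrates the idea only; the names `SKIP_ACTIONS`, `BOX_MARKERS`, and `classify` are hypothetical, and the set of box markers is an assumption read from the FROM ... TO lines of Figure 11.

```python
# Skip markers and their effects, as listed in the text.
SKIP_ACTIONS = {
    "N": "skip to a new page",
    "4": "skip to a new line",
    "6": "skip down 10 points",
    "A": "begin two-column format",
    "C": "skip to a new column",
}

# Box markers, assumed from the FROM ... TO lines of Figure 11.
BOX_MARKERS = frozenset("235789")


def classify(marker):
    """Return ("skip", action) or ("box", marker) for one boundary character."""
    if marker in SKIP_ACTIONS:
        return ("skip", SKIP_ACTIONS[marker])
    if marker in BOX_MARKERS:
        return ("box", marker)
    raise ValueError(f"no parameter given for boundary character {marker!r}")


for ch in ["8", "C", "N"]:
    print(ch, classify(ch))
```

A table of this kind keeps the meaning of each marker in the parameters rather than in the text, which is why the same standard-format text could be reused for purposes other than typesetting.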

With this illustration, the description of our cur¬ 
rent typesetting system is complete. The typesetting 
system itself, however, is not now complete, nor 
will it be until it is abandoned to a dusty completed 
projects file to rest unused. Some improvements and 
extensions are planned for the coming months, 
while others that seem promising will wait for im¬ 
proved hardware. One improvement, for example, 
will be in the ability of the system to handle tabular 
material, not only tables of numbers or of words, 
but also tables of contents and indices. Then again, 
there is no provision in our system for setting com¬ 
plex mathematical expressions. We have, in fact, no 
way to represent such forms in a linear string which 
would allow their efficient reconstruction in two 
dimensions on paper. This problem, however, is not 
of great importance to us, since the equipment we 
have could handle only the simplest formulas. 

As our last example has shown, this system has 
Figure 12. The first three pages of the postpartum care 
booklet. 

some facility in controlling page format in detail. 
This facility is not as complete nor as easy to use as 
we would like it to be. We can expect some im¬ 
provements in the language we use to specify for¬ 
mats and typesetting operations, but significant im¬ 
provements will wait for new equipment. For type¬ 
setting the kind of material exemplified by our 
booklet illustration, no system will be entirely satis¬ 
factory that does not include a manipulable visual 
display. In such a system, material to be formatted 
would be punched simply as straight text along with 
the general parameters and a few text boundary par¬ 
ameters and associated markers, perhaps only mark- 
ing page boundaries. The system would read the 


text and display one page at a time on a scope. The 
author or editor would then move this material 
about on the face of the scope, changing type size 
and font at will, until the exact format he wants is 
obtained. Then, with a push of a button, the page 
would be written out along with the appropriate 
codes to set it in that form. The display in such a 
system would not have to have high graphic arts 
quality, but the resolution would have to be great 
enough to provide exact point size and letter spacing 
representation for the fonts being used. 

In the beginning of our discussion, we asserted 
that the purpose of our text processing system was 
to extend the abilities of the scholar in performing 
his work. Whatever else a scholar may do, it seems 
essential that he be able to: (1) make accurate ob¬ 
servations, (2) collect, sort out, and understand the 
information in his field, (3) integrate his observa¬ 
tions with current knowledge to produce new infor- 
mation, and finally (4) make this new information 
public. We have been concerned with the last of 
these, describing in some detail one particular system
intended to aid in the publication of information.
The characteristics of this system are derived
from our straightforward attempts to use modern 
computing equipment and programming techniques 
to duplicate as well as possible the work that is 
done by printers and publishers. If we are success¬ 
ful, the printed material we produce will be nearly 
as good as that we are trying to duplicate, but done 
much faster. If this is the extent of our own schol¬ 
arly work, then surely we have been unimaginative. 

Imagine a system of publication that has the fol¬ 
lowing characteristics. First, a scholar publishes in 
this system by making his work available on mag¬ 
netic tape. His publication is then “seen” by other 
scholars only when a computer has made the deci¬ 
sion that his work is both pertinent to and impor¬ 
tant for some request for information. We assume 
that the computer’s decisions in these matters are less
fallible than the scholar’s own. Suppose that there
are many more subscribers to this system than to
any current journal and that the coverage available
is just as broad or as narrow as the interests of any
individual scholar. Finally, suppose that publication
in this system is nearly immediate. If this system 
were in existence, then there would be no further 
need of scholarly publications in printed form, ex¬ 
cept perhaps for vanity. 

Can the computer do all of this? There are those 
of us who think it cannot. But what of the scholar? 
Can he continue to function for long when the in¬ 
formation he must collect and sort out and under¬ 
stand expands exponentially? We may be certain 
that the scholar will continue to function on some 
level; that he will continue to generate information. 
The computer can aid in processing this informa¬ 
tion. That we already know. The computer alone 
may not be able to evaluate the importance of a document
to some line of investigation, but a computer
can hold statistics, and the interactions between
men and computers may easily generate evaluations.
In our research, we are interested not so much in
what the computer can do, but rather in what the computer
and scholar together can do better than either
can do alone.
Appendix 

THE TEXT EDITOR PUSH-BUTTON SYMBOLS 

RUN Causes the text to be set into motion. 

FWD Causes the motion of the text to be from 
bottom to top. 

REV Causes the motion of the text to be from 
top to bottom. 

FAS Accelerates the motion of the text as long
as the light pen is held on this symbol.

SLO Decelerates the motion of the text as long
as the light pen is held on this symbol.

HLT Halts the motion of the text.

MAN Causes the text to move when the light pen
is held on it, if it is otherwise halted, and
vice versa.

_C_ Cursor.

_L_ Left delimiter.

_R_ Right delimiter.

The above three symbols control the ability of the 
light pen to move one or another of the underlines. 
The symbols themselves vary, with the middle let¬ 
ters C, L, and R remaining constant. An initial letter 
D shows which of the three underlines may be moved 
by the light pen, and a final letter D or N tells 
whether or not the given underline is defined. The 
cursor is always defined, and hence its symbol always 
appears as _CD or DCD. 
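
The naming rule for these three symbols can be stated as a short Python function. It is a sketch under assumptions: the function name `symbol` is hypothetical, and an underscore is assumed to stand in the first position when the underline is not the one the light pen may move.

```python
UNDERLINES = {"C": "cursor", "L": "left delimiter", "R": "right delimiter"}


def symbol(letter, selected, defined=True):
    """Build the three-character push-button symbol for one underline.

    letter   -- middle letter: C, L, or R
    selected -- True if the light pen may currently move this underline
    defined  -- True if the underline is defined
    """
    if letter not in UNDERLINES:
        raise ValueError(f"unknown underline {letter!r}")
    if letter == "C" and not defined:
        raise ValueError("the cursor is always defined")
    first = "D" if selected else "_"   # "_" is an assumed placeholder
    last = "D" if defined else "N"
    return first + letter + last


print(symbol("C", selected=True))                  # DCD
print(symbol("L", selected=False, defined=False))  # _LN
```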

TYP Starts typing on the Teletype the block be¬ 
tween the left and right delimiters. 

TYH Causes typing to stop immediately. 

DEL Causes the block between the left and right
delimiters to be deleted; operates only if the
left and right delimiters are properly defined.

MOV Causes the delimited block to be moved to
the point immediately to the right of the
cursor.

CLR Causes the text area to be cleared. 

SPG This function, the symbol pattern generator, 
gives rise to a different display. The entire 
alphabet is displayed and each symbol may 
be selected for change with the light pen 
or the Teletype. An enlarged replica of the 
five-by-seven dot pattern is altered by the 
light pen to create a new pattern. A light 
patch in the upper right corner returns the 
program to the normal text display. 
? This is the interlock symbol and it has no 
function of its own, but serves to activate 
whatever other function has caused this sym¬ 
bol to be changed. When an interlocked 
function’s symbol is sensed, it is made to 
appear in place of the question mark. When 
this new symbol meets the light pen, the 
function is initiated, and the question mark 
returns. The CLR function and all input/ 
output functions are interlocked in this way. 
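
The interlock behaves as a two-step confirmation, which the following Python sketch models as a small state machine. The class name `Interlock` and its methods `sense` and `confirm` are hypothetical; the sketch assumes only what the description states, namely that sensing an interlocked function replaces the question mark and that a second light-pen contact on the replaced symbol starts the function and restores the question mark.

```python
class Interlock:
    """Two-step confirmation for destructive or input/output functions."""

    IDLE = "?"

    def __init__(self, actions):
        # actions maps a function symbol such as "CLR" to a callable.
        self.actions = actions
        self.display = self.IDLE

    def sense(self, name):
        """First light-pen contact: show the function's symbol in place of '?'."""
        if name not in self.actions:
            raise KeyError(f"{name!r} is not an interlocked function")
        self.display = name

    def confirm(self):
        """Second contact, on the interlock position: run the pending function."""
        if self.display == self.IDLE:
            return None  # nothing pending, so the interlock does nothing
        name = self.display
        self.display = self.IDLE
        return self.actions[name]()


log = []
lock = Interlock({"CLR": lambda: log.append("text area cleared")})
lock.sense("CLR")   # display now reads CLR, nothing has been cleared yet
lock.confirm()      # the clear happens here and '?' returns
print(log, lock.display)
```

Requiring the second contact means that an accidental brush of the light pen across CLR cannot by itself destroy the text.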

IN Reads paper tape and appends the information
to the end of the text until the text
storage area is almost full or until a stop code
or two successive carriage returns have been
read.

OUT Causes text to be punched, in Flexowriter 
format, starting at the beginning of the text 
and continuing until the character above the 
cursor has been punched. 

DMP Functions the same as the OUT symbol but 
deletes the text punched. 


BIG Causes the letter patterns themselves to be
punched and may be used to produce readable
titles.

RMT Causes a single 120-character record to be 
read through the 7090 interface. 

WMT Causes a single 120-character record to be 
written through the 7090 interface. 

k^MT Causes the entire text area to be written, 
then cleared, by repetition of the WMT 
function. 

WTM Causes a tape mark to be written on the
output tape.

RWD Rewinds the input and output tapes, 7090
logical tapes 2 and 3.

SBC Causes input and output tapes to be logi¬ 
cally interchanged. The “SBC” symbol then 
becomes the “SCB” symbol. 

DMR Causes the cursor to be moved to the end 
of the text, executes the DMP function, and 
then the IN function. 





